A big data approach to the ultra-fast prediction of DFT-calculated bond energies
Abstract
BACKGROUND Rapid access to intrinsic physicochemical properties of molecules is highly desirable for large-scale chemical data mining explorations such as mass spectrum prediction in metabolomics, toxicity risk assessment and drug discovery. Quantum chemistry calculations, e.g. by Density Functional Theory (DFT), are producing large volumes of data and provide increasingly accurate estimations of several properties, but they are still too computationally expensive for those large-scale uses. This work explores the possibility of using large amounts of data generated by DFT methods for thousands of molecular structures, extracting relevant molecular properties and applying machine learning (ML) algorithms to learn from the data. Once trained, these ML models can be applied to new structures to produce ultra-fast predictions. An approach is presented for homolytic bond dissociation energy (BDE).
RESULTS Machine learning models were trained with a data set of >12,000 BDEs calculated by B3LYP/6-311++G(d,p)//DFTB. Descriptors were designed to encode atom types and connectivity in the 2D topological environment of the bonds. The best model, an Associative Neural Network (ASNN) based on 85 bond descriptors, was able to predict the BDE of 887 bonds in an independent test set (covering a range of 17.67-202.30 kcal/mol) with an RMSD of 5.29 kcal/mol, a mean absolute deviation of 3.35 kcal/mol, and R² = 0.953. The predictions were compared with semi-empirical PM6 calculations and were found to be superior for all types of bonds in the data set, except for O-H, N-H, and N-N bonds. The B3LYP/6-311++G(d,p)//DFTB calculations can approach the higher-level calculations B3LYP/6-311++G(3df,2p)//B3LYP/6-31G(d,p) with an RMSD of 3.04 kcal/mol, which is less than the RMSD of ASNN (against both DFT methods). An experimental web service for on-line prediction of BDEs is available at http://joao.airesdesousa.com/bde.
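The error statistics quoted in the results (RMSD, mean absolute deviation, R²) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names are hypothetical, the BDE values are invented purely for illustration, and R² is assumed here to be the squared Pearson correlation coefficient, which the abstract does not specify.

```python
import math


def rmsd(reference, predicted):
    """Root-mean-square deviation between reference and predicted values."""
    n = len(reference)
    return math.sqrt(sum((p - r) ** 2 for r, p in zip(reference, predicted)) / n)


def mean_absolute_deviation(reference, predicted):
    """Mean of the absolute differences between reference and predicted values."""
    n = len(reference)
    return sum(abs(p - r) for r, p in zip(reference, predicted)) / n


def r_squared(reference, predicted):
    """Squared Pearson correlation (an assumed definition of R^2)."""
    n = len(reference)
    mean_r = sum(reference) / n
    mean_p = sum(predicted) / n
    cov = sum((r - mean_r) * (p - mean_p) for r, p in zip(reference, predicted))
    var_r = sum((r - mean_r) ** 2 for r in reference)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_r * var_p)


if __name__ == "__main__":
    # Invented BDE values in kcal/mol, for illustration only (not from the paper).
    dft_bde = [85.0, 102.5, 70.2, 110.8, 95.4]  # stand-in for DFT reference values
    ml_bde = [88.1, 99.7, 73.0, 108.2, 97.9]    # stand-in for ML predictions

    print(f"RMSD: {rmsd(dft_bde, ml_bde):.2f} kcal/mol")
    print(f"MAD:  {mean_absolute_deviation(dft_bde, ml_bde):.2f} kcal/mol")
    print(f"R^2:  {r_squared(dft_bde, ml_bde):.3f}")
```

Because RMSD squares each error before averaging, it is never smaller than the mean absolute deviation and grows faster when a few bonds are predicted badly, which is consistent with the reported 5.29 versus 3.35 kcal/mol.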
CONCLUSION Knowledge could be automatically extracted by machine learning techniques from a data set of calculated BDEs, providing ultra-fast access to accurate estimations of DFT-calculated BDEs. This demonstrates how to extract value from the large volumes of data currently being produced by quantum chemistry calculations at an increasing speed, mostly without human intervention. In this way, high-level theoretical quantum calculations can be used in large-scale applications for which their intrinsic computational cost would otherwise be prohibitive.